Canonical Huffman code について

Words near each other

・ Canonical commutation relation
・ Canonical connection
・ Canonical coordinates
・ Canonical coronation
・ Canonical correlation
・ Canonical correspondence analysis
・ Canonical cover
・ Canonical criticism
・ Canonical domain
・ Canonical election
・ Canonical ensemble
・ Canonical erection of a house of religious
・ Canonical faculties
・ Canonical form
・ Canonical hours
・ Canonical Huffman code
・ Canonical impediment
・ Canonical Inquisition
・ Canonical institution
・ Canonical link element
・ Canonical LR parser
・ Canonical map
・ Canonical model
・ Canonical model (disambiguation)
・ Canonical normal form
・ Canonical Old Roman Catholic Church
・ Canonical order
・ Canonical protocol pattern
・ Canonical provision
・ Canonical quantization

Dictionary Lists

mini英和辞書

翻訳と辞書　辞書検索 [ 開発暫定版 ]

スポンサードリンク

Canonical Huffman code ：ウィキペディア英語版

Canonical Huffman code

A canonical Huffman code is a particular type of Huffman code with unique properties which allow it to be described in a very compact manner.
Data compressors generally work in one of two ways. Either the decompressor can infer what codebook the compressor has used from previous context, or the compressor must tell the decompressor what the codebook is. Since a canonical Huffman codebook can be stored especially efficiently, most compressors start by generating a "normal" Huffman codebook, and then convert it to canonical Huffman before using it.
In order for a symbol code scheme such as the Huffman code to be decompressed, the same model that the encoding algorithm used to compress the source data must be provided to the decoding algorithm so that it can use it to decompress the encoded data. In standard Huffman coding this model takes the form of a tree of variable-length codes, with the most frequent symbols located at the top of the structure and being represented by the fewest number of bits.
However, this code tree introduces two critical inefficiencies into an implementation of the coding scheme. Firstly, each node of the tree must store either references to its child nodes or the symbol that it represents. This is expensive in memory usage and if there is a high proportion of unique symbols in the source data then the size of the code tree can account for a significant amount of the overall encoded data. Secondly, traversing the tree is computationally costly, since it requires the algorithm to jump randomly through the structure in memory as each bit in the encoded data is read in.
Canonical Huffman codes address these two issues by generating the codes in a clear standardized format; all the codes for a given length are assigned their values sequentially. This means that instead of storing the structure of the code tree for decompression only the lengths of the codes are required, reducing the size of the encoded data. Additionally, because the codes are sequential, the decoding algorithm can be dramatically simplified so that it is computationally efficient.
==Algorithm==
The normal Huffman coding algorithm assigns a variable length code to every symbol in the alphabet. More frequently used symbols will be assigned a shorter code. For example, suppose we have the following ''non''-canonical codebook:
A = 11
B = 0
C = 101
D = 100
Here the letter A has been assigned 2 bits, B has 1 bit, and C and D both have 3 bits. To make the code a ''canonical'' Huffman code, the codes are renumbered. The bit lengths stay the same with the code book being sorted ''first'' by codeword length and ''secondly'' by alphabetical value:
B = 0
A = 11
C = 101
D = 100
Each of the existing codes are replaced with a new one of the same length, using the following algorithm:
* The ''first'' symbol in the list gets assigned a codeword which is the same length as the symbol's original codeword but all zeros. This will often be a single zero ('0').
* Each subsequent symbol is assigned the next binary number in sequence, ensuring that following codes are always higher in value.
* When you reach a longer codeword, then ''after'' incrementing, append zeros until the length of the new codeword is equal to the length of the old codeword. This can be thought of as a left shift.
By following these three rules, the ''canonical'' version of the code book produced will be:
B = 0
A = 10
C = 110
D = 111

抄文引用元・出典: フリー百科事典『ウィキペディア（Wikipedia）』
■ウィキペディアで「Canonical Huffman code」の詳細全文を読む

スポンサードリンク

翻訳と辞書 : 翻訳のためのインターネットリソース